
    Integrating a Non-Uniformly Sampled Software Retina with a Deep CNN Model

    We present a biologically inspired method for pre-processing images fed to CNNs that reduces their memory requirements while increasing their invariance to scale and rotation changes. Our method is based on the mammalian retino-cortical transform: a mapping between a pseudo-randomly tessellated retina model (used to sample an input image) and a CNN. The aim of this first pilot study is to demonstrate a functional retina-integrated CNN implementation, and this produced the following results: a network using the full retino-cortical transform yielded an F1 score of 0.80 on a test set during a 4-way classification task, while an identical network not using the proposed method yielded an F1 score of 0.86 on the same task. The method reduced the visual data by a factor of 7, the input data to the CNN by 40% and the number of CNN training epochs by 64%. These results demonstrate the viability of our method and hint at the potential of exploiting functional traits of natural vision systems in CNNs.
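    The abstract does not spell out the retina model's implementation, so the following is a minimal illustrative sketch of non-uniform retinal sampling, assuming a hypothetical log-uniform tessellation and Gaussian receptive fields; `build_retina`, `sample_image` and every parameter below are invented for illustration and are not the authors' code:

    ```python
    import numpy as np

    def build_retina(n_nodes=1000, fovea_radius=5.0, max_radius=100.0, seed=0):
        """Pseudo-random retina tessellation: node density falls off with
        eccentricity, so the fovea is densely sampled and the periphery
        coarsely (hypothetical parameterisation)."""
        rng = np.random.default_rng(seed)
        # Log-uniform eccentricities give an approximately 1/r node density.
        r = fovea_radius * (max_radius / fovea_radius) ** rng.random(n_nodes)
        theta = 2 * np.pi * rng.random(n_nodes)
        xy = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
        rf_sigma = 1.0 + 0.1 * r  # receptive fields grow with eccentricity
        return xy, rf_sigma

    def sample_image(img, xy, rf_sigma, fixation):
        """One value per retina node: a Gaussian-weighted average of a
        grayscale image over that node's receptive field. `fixation` is
        the (x, y) image point the retina is centred on."""
        h, w = img.shape
        out = np.zeros(len(xy))
        for i, ((x, y), s) in enumerate(zip(xy + fixation, rf_sigma)):
            x0, x1 = int(max(0, x - 3 * s)), int(min(w, x + 3 * s + 1))
            y0, y1 = int(max(0, y - 3 * s)), int(min(h, y + 3 * s + 1))
            if x0 >= x1 or y0 >= y1:
                continue  # node falls outside the image
            ys, xs = np.mgrid[y0:y1, x0:x1]
            wgt = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * s * s))
            out[i] = (wgt * img[y0:y1, x0:x1]).sum() / wgt.sum()
        return out
    ```

    The data reduction the abstract reports comes from exactly this kind of layout: most nodes sit in the fovea, while the periphery is summarised by a few large receptive fields.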

    Object Edge Contour Localisation Based on HexBinary Feature Matching

    This paper addresses the issue of localising object edge contours in cluttered backgrounds, both to support robotics tasks such as grasping and manipulation and to improve the potential perceptual capabilities of robot vision systems. Our approach is based on coarse-to-fine matching of a new recursively constructed hierarchical, dense, edge-localised descriptor, the HexBinary, based on the HexHoG descriptor structure first proposed in [1]. Since Binary String image descriptors [2]–[5] require much lower computational resources, yet provide similar or even better matching performance than Histogram of Oriented Gradients (HoG) descriptors, we have replaced the HoG base descriptor fields used in HexHoG with Binary Strings generated from first- and second-order polar derivative approximations. The ALOI [6] dataset is used to evaluate the HexBinary descriptors, which we demonstrate to achieve superior performance to HexHoG [1] for pose refinement. The validation of our object contour localisation system shows promising results, correctly labelling ~86% of edgel positions and mis-labelling only ~3%.
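    The HexBinary construction itself is left to the paper; the sketch below illustrates only the generic reason binary-string descriptors are cheap to match, which the abstract relies on: distance is a Hamming distance (XOR plus a popcount) rather than a comparison of floating-point HoG histograms. The packed-uint8 layout and the `max_dist` threshold are assumptions, not details from the paper:

    ```python
    import numpy as np

    def hamming(a: np.ndarray, b: np.ndarray) -> int:
        """Hamming distance between two binary descriptors packed as
        uint8 arrays: XOR the bytes, then count the set bits."""
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    def match_binary(query, candidates, max_dist=64):
        """Brute-force nearest-neighbour matching of binary descriptors,
        keeping only matches below a distance threshold."""
        matches = []
        for qi, q in enumerate(query):
            dists = [hamming(q, c) for c in candidates]
            best = int(np.argmin(dists))
            if dists[best] <= max_dist:
                matches.append((qi, best, dists[best]))
        return matches
    ```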

    A Portable Active Binocular Robot Vision Architecture for Scene Exploration

    We present a portable active binocular robot vision architecture that integrates a number of visual behaviours. This vision architecture inherits the abilities of vergence, localisation, recognition and simultaneous identification of multiple target object instances. To demonstrate the portability of our vision architecture, we carry out qualitative and comparative analyses under two different hardware robotic settings, feature extraction techniques and viewpoints. Our portable active binocular robot vision architecture achieved average recognition rates of 93.5% for fronto-parallel viewpoints and 83% for anthropomorphic viewpoints.

    Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting

    We present an interactive perception model for object sorting based on Gaussian Process (GP) classification that is capable of recognizing object categories from point cloud data. In our approach, FPFH features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probabilistic estimate of the identity of each object, and serves a key role in the interactive perception cycle – modelling perception confidence. We show results from simulated input data on both SVM- and GP-based multi-class classifiers to validate the recognition accuracy of our proposed perception model. Our results demonstrate that by using a GP-based classifier, we obtain true positive classification rates of up to 80%. Our semi-autonomous object sorting experiments show that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects.
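    As a rough illustration of the pipeline the abstract describes (local FPFH features, Bag-of-Words coding, multi-class GP classification), here is a minimal scikit-learn sketch; the codebook size, kernel choice and input conventions are assumptions rather than the paper's settings:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    def bow_encode(local_feats, codebook):
        """Quantise an object's local descriptors (e.g. 33-D FPFH vectors)
        against the codebook and return a normalised word histogram."""
        words = codebook.predict(local_feats)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    def fit_pipeline(train_objects, train_labels, n_words=50):
        """train_objects: list of (N_i, D) arrays of local features, one
        per object; train_labels: their categories (hypothetical inputs)."""
        codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0)
        codebook.fit(np.vstack(train_objects))
        X = np.array([bow_encode(f, codebook) for f in train_objects])
        gp = GaussianProcessClassifier(kernel=1.0 * RBF(1.0))  # one-vs-rest
        gp.fit(X, train_labels)
        return codebook, gp

    # gp.predict_proba(bow_encode(feats, codebook)[None]) then yields the
    # per-class probabilities that can drive a perception-confidence loop.
    ```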

    On the Calibration of Active Binocular and RGBD Vision Systems for Dual-Arm Robots

    This paper describes a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot. For this purpose, we derive the forward kinematic model of our active robot head and describe our methodology for calibrating and integrating it. This rigid calibration provides a closed-form hand-to-eye solution. We then present an approach for dynamically updating the cameras' external parameters for optimal 3D reconstruction, the foundation for robotic tasks such as grasping and manipulating rigid and deformable objects. We show from experimental results that our robot head achieves an overall sub-millimetre accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, we report a comparative study between current RGBD cameras and our active stereo head within two dual-arm robotic testbeds that demonstrates the accuracy and portability of our proposed methodology.
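    The paper derives its own kinematic model and closed-form solution for an active head; purely as a generic illustration of the closed-form hand-eye problem it solves, OpenCV's calibrateHandEye recovers a fixed camera-to-gripper transform from paired robot and camera poses. The Tsai-Lenz method choice and the 4x4 packaging below are illustrative assumptions, not the authors' procedure:

    ```python
    import cv2
    import numpy as np

    def hand_eye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
        """Closed-form eye-in-hand calibration (Tsai-Lenz): for each pose i,
        inputs are the gripper-to-base transform from forward kinematics and
        the target-to-camera transform from e.g. a chessboard pose estimate,
        given as lists of 3x3 rotations and 3x1 translations."""
        R, t = cv2.calibrateHandEye(R_gripper2base, t_gripper2base,
                                    R_target2cam, t_target2cam,
                                    method=cv2.CALIB_HAND_EYE_TSAI)
        T = np.eye(4)  # pack into a homogeneous camera-to-gripper transform
        T[:3, :3], T[:3, 3] = R, t.ravel()
        return T
    ```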

    Egocentric Perception using a Biologically Inspired Software Retina Integrated with a Deep CNN

    We presented the concept of a software retina, capable of significant visual data reduction in combination with scale and rotation invariance, for applications in egocentric and robot vision at the first EPIC workshop in Amsterdam [9]. Our method is based on the mammalian retino-cortical transform: a mapping between a pseudo-randomly tessellated retina model (used to sample an input image) and a CNN. The aim of this first pilot study is to demonstrate a functional retina-integrated CNN implementation, and this produced the following results: a network using the full retino-cortical transform yielded an F1 score of 0.80 on a test set during a 4-way classification task, while an identical network not using the proposed method yielded an F1 score of 0.86 on the same task. On a 40K-node retina the method reduced the visual data by a factor of 7, the input data to the CNN by 40% and the number of CNN training epochs by 36%. These results demonstrate the viability of our method and hint at the potential of exploiting functional traits of natural vision systems in CNNs. In addition to the above study, we present further recent developments in porting the retina to an Apple iPhone, an implementation in CUDA C for NVIDIA GPU platforms, and extensions of the retina model we have adopted.

    Glasgow's Stereo Image Database of Garments

    To provide insight into cloth perception and manipulation with an active binocular robotic vision system, we have compiled and released a database of 80 stereo-pair colour images with corresponding horizontal and vertical disparity maps and mask annotations, for 3D garment point cloud rendering. The stereo-image garment database is part of research conducted under the EU-FP7 Clothes Perception and Manipulation (CloPeMa) project and belongs to a wider database collection released through CloPeMa (www.clopema.eu). This database is based on 16 different off-the-shelf garments. Each garment has been imaged in five different pose configurations on the project's binocular robot head. A full copy of the database is made available for scientific research only at https://sites.google.com/site/ugstereodatabase/.
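    To hint at how such disparity maps become garment point clouds, here is a standard pinhole back-projection sketch; it assumes a rectified stereo pair and hypothetical camera parameters (f, baseline, cx, cy), and is not the database's own rendering code:

    ```python
    import numpy as np

    def disparity_to_points(disp, mask, f, baseline, cx, cy):
        """Back-project a horizontal disparity map into a 3D point cloud:
        Z = f * B / d,  X = (u - cx) * Z / f,  Y = (v - cy) * Z / f."""
        v, u = np.indices(disp.shape)
        valid = mask & (disp > 0)          # keep annotated, matched pixels
        Z = np.zeros_like(disp, dtype=float)
        Z[valid] = f * baseline / disp[valid]
        X = (u - cx) * Z / f
        Y = (v - cy) * Z / f
        return np.stack([X, Y, Z], axis=-1)[valid]  # (N, 3) points
    ```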

    A Software Retina for Egocentric & Robotic Vision Applications on Mobile Platforms

    We present work in progress to develop a low-cost, highly integrated camera sensor for egocentric and robotic vision. Our underlying approach is to address current limitations of image analysis by Deep Convolutional Neural Networks, such as the requirement to learn simple scale and rotation transformations, which contributes to the large computational demands of training and the opaqueness of the learned structure, by applying structural constraints based on known properties of the human visual system. We propose to apply a version of the retino-cortical transform to reduce the dimensionality of the input image space by a factor of 100, and to map this spatially so that rotations and scale changes become spatial shifts. By reducing the input image size, and therefore the learning requirements, accordingly, we aim to develop a compact and lightweight egocentric and robot vision sensor using a smartphone as the target platform.
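    The scale/rotation-to-shift property this abstract relies on can be demonstrated with OpenCV's log-polar warp, a much simpler stand-in for the authors' pseudo-randomly tessellated retina; the output size and interpolation flags below are arbitrary choices:

    ```python
    import cv2

    def cortical_map(img, out_w=128, out_h=128):
        """Log-polar remap about the image centre. Scaling the input
        becomes a shift along the (log-radius) columns and rotating it
        becomes a shift along the (angle) rows, so a downstream CNN sees
        both transformations as ordinary translations."""
        h, w = img.shape[:2]
        centre = (w / 2.0, h / 2.0)
        return cv2.warpPolar(img, (out_w, out_h), centre, min(centre),
                             cv2.INTER_LINEAR | cv2.WARP_POLAR_LOG)
    ```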